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The  Finite  Memory  Prediction  of  Covariance  Stationary 


Time  Series 

by  H.J.  Newton  and  Marcello  Pagano 

Institute  of  Statistics,  Texas  A  &  M  University 

and 

Harvard  University  and  Sidney  Farber  Cancer  Institute 

/  Summary 

An  algorithm  is  presented  for  conveniently  calculating  h  step  ahead 
minimum  mean  square  linear  predictors  and  prediction  variances  given  a 
finite  number  of  observations  from  a  covariance  stationary  time  series  Y. 
It  is  shown  that  elements  of  the  modified  Cholesky  decomposition  of  the 
covariance  matrix  of  observations  play  the  role  in  finite  memory  pre¬ 
diction  that  the  coefficients  in  the  infinite  order  moving  average  repre¬ 
sentation  of  Y  play  in  infinite  memory  prediction.  The  algorithm  is 
applied  to  autoregressive-moving  average  time  series  where  further  sim¬ 
plifications  are  shown  to  occur.  A  numerical  example  illustrating  the 
basic  points  of  the  general  algorithm  is  presented. 

Some  key  words:  Covariance  stationary  time  series:  minimum  mean 

square  linear  prediction;  modified  Cholesky  decomposition 
algorithm;  autoregressive-moving  average  time  series. 


1.  Introduction 

As  pointed  out  by  Whittle  (1963,  p.47),  the  calculations  required  to 
find  finite  memory  nredictors  for  covariance  stationary  time  series  are 
made  difficult  by  the  need  to  calculate  the  inverse  of  the  T  x  t  covariance 


matrix  of  the  observations. 


Thus  many  authors  (see  Box  and  Jenkins  (1970,  p.  126)  have  proposed 
using  approximate  infinite  memory  predictors  rather  than  finding  the 
exact  finite  memory  predictors. 

Pagano  (1976)  has  given  an  algorithm  for  finite  memory  prediction 
of  a  pure  moving  average  process  which  reduces  much  of  the  calculation 
in  the  general  algorithm.  All  (1977)  uses  a  well  known  result  to  reduce 
inverting  the  T  x  T  matrix  to  the  successive  inversion  of  smaller  matrices. 

The  purpose  of  this  paper  is  to  propose  a  general  algorithm  for  pre¬ 
diction  of  covariance  stationary  time  series  which  capitalizes  on  the 
special  structure  of  the  modified  Cholesky  decomposition  of  a  symmetric 
Toeplitz  covariance  matrix.  Section  2  contains  the  algorithm  as  theorem  1 
which  also  shows  the  analogy  of  the  algorithm  with  infinite  memory  pre¬ 
diction.  In  section  3  theorem  2  presents  the  results  of  applying  theorem  1 
to  autoregressive-moving  average  processes.  Finally  a  numerical  example 
is  presented  in  section  4  illustrating  theorem  1. 

2.  Finite  Memory,  Horizon  h.  Minimum  Mean  Square  Linear  Prediction  of 
Covariance  Stationary  Time  Series. 

Consider  a  zero  mean  covariance  stationary  time  series  (Y(t),  t=0 
with  autocovariance  function  R(v)  =  E(Y(t)Y(t+v) ) .  Then  given  observations 
Y(l),  ...,  Y(T),  the  horizon  h,  memory  T,  minimum  mean  square  error  linear 
predictor  Y(T+h|T)  of  Y(T+h)  is  given  by  that  linear  combination  of 
Y(l),  ...,  Y(T)  that  minimizes  E  (Y(T+h)  -  Y(T+h|T)K. 

Thus 

T 

Y (T+h | T)  -  l  A  (j)  Y(T+1-1) 
j-1  ’ 

where  h  “  (XT  h(D*  •••»  satisfies 
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PT  *T,h  *  ?T,h  *  where  ?T  h  =  ••••  R(h+T-D) 


an<^  ^  TOEPL  (R(0) ,  . ..,  R(T-l)),  i.e.  is  the  T  x  T  symmetric 
Toeplitz  matrix  having  (j,k)  element  R(|j-k|)  . 

Suppose  that  Y  is  purely  nondeterministic,  i, e. 


I  log  f  (tu)doi  J  >  0 


2“rrexp  ^  /  log  f  (u)) dco  ^  >0  where  f  is  the  spectral  density 

function  of  Y.  Then  Y(t)  can  be  represented  as  the  limit  in  mean  square 
of  an  infinite  order  moving  average  process,  i.e. 


Y(t)  =  l  Mk)  G(t-k)  (1) 

k=0 


where  e(t)  is  the  infinite  memory  horizon  one  error  in  predicting  Y(t) 

and  E(e (T) e (T-j ) )  =  6  of  for  all  integer  T  and  j ,  where  6  is  the  Kroneck 

J  J 

delta.  Also  the  horizon  h,  minimum  mean  square  error  linear  infinite 
memory  predictor  Y(T+h|T,  T-l,  ...)  and  prediction  variance  ,  are 
given  by 


Y(T+h | T,  T-l,  . ..)  =  l 


B0O(k)e( T+h-k)  , 


h-1 

l  3, 

=0 


(k)  . 


(2) 

(3) 


The  process  Y  being  purely  nondeterministic  also  means  that  its  auto¬ 
covariance  function  is  positive  definite.  Thus  for  all  T  we  can  form  the 
modified  Cholesky  decomposition  (Wilkinson  (1965))  *  LtDtLT  of  VT  where 

L^,  is  a  T  x  t  unit  lower  triangular  matrix  and  is  a  T  x  T  diagonal 
matrix.  An  important  property  of  and  is  that  they  are  nested  for 
increasing  T,  i.e. 

■  ir . 
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r 


lt  0 
*T  1 


.  D, 


°T  ° 


0  d„ 


Thus  the  (j,k)th  element  of  L T  will  be  referred  to  as  L^* 

The  following  theorem  shows  the  role  played  by  L^  and  DT  in  finite 
memory  prediction. 


Theorem  1 

Let  Y  be  a  purely  nondeterministic  covariance  stationary  time  series 

T 

with  covariance  function  R.  Let  r  ^  =  LTD^LT  be  the  modified  Cholesky 

T 

decomposition  of  the  covariance  matrix  of  =  (Y(l),  ...»  Y(T)).  Define 
ej  =  (e(l) ,  e(T) )  by  =Yf.  Then 


T+h-1 

a)  Y(T+h | T)  -  J  iT+h.tth-k  e<1+h-H 
k=h 


2  hr1  .2 


b)  o  =  E  (Y(T+h)  -  Y(T+h|T)}  -  l  Li 


T+h,  T+h-k  T+h-k 


C)  i}  LT,T-j  =  *-<J> 


ii)  lim  d_  =  a 

j  00 

T-#~ 


Proof  of  Theorem 

a)  Defining  the  T  x  T  permutation  matrix  to  be  a  matrix  of  zeros 

T 

with  ones  on  the  main  reverse  diagonal,  we  have  Y(T+h|T)  =  h  ^T  ^T 

where  rT  X  ,  *  r  ,  since  premultiplication  (postmultiplication)  by  PT 
x  -  l  ,n  i  ,n 

T  _  ^ 

reverses  row  (column)  order  of  a  matrix.  Thus  Y(T+h|T)  “  h  PT  PT  -T 

*  r!  .  P_  rl1  Pm  P_  Y_  *  pi  .  rl1  Y_  where  p_  .  *  P_  rT  .  since  ■  I_ 

■"Tjh  T  T  T  T  ■" T  ~T,h  T  *“ T  "“ljn  i  -l,n  i  l 

and  since  for  the  symmetric  Toeplitz  matrix  we  have  PT  * 


. 


-5- 


Thus  Y(T-Hi|T)  =  p*  L‘T  d"1  L'1  Yt  =  pj  h  L“T  d"1  eT  .  To  show  that 

T 

this  is  the  result  in  (a)  we  note  1)  p_  ,  =  (R(T+h-l) ,  R(h) )  is  the 

-T,h 

-T  -1 

last  row  of  F-,.,  without  its  last  h  elements,  2)  F  L  D_  =  L_  for  all  T, 
-  1-rn  Ill  l 

-T  -1 

and  3)  because  of  the  nesting  of  the  LT  and  PT*  b^  is  the  T  x  T  prin- 

-T  -1  -1 

cipal  minor  of  the  upper  triangular  matrix  58  rx+h  LT+h  * 

1111,8  (?T,h  LT  °T  ^k  =  ^T+h  LT+h  DT+hV+h,k  =  (?T+h  fT+h  LT+h)T+h,k 
=  LT+h,k  ’  proving  (a)- 

To  prove  (b),  note  that  h  =  R(°)  "  Ej  h  ^  -T  h  = 

"eT,h  ltT  ^t1  H1  eT,h  = R(0) "  5fh  dt  V  where  5.h =  eJ(h  ltT  dt1 

which  as  above  is  the  row  vector  (L_,^  ,  ,  L_.,  „)  .  Also  R(0) 

1  Tfl  9  X  1  *  U  f  X 

T  T+h  2 

"  rT+h,  T+h  =  (LT+h  nT+h  LT+h^T+h,  T+h  =  L^+h,k  \*  thus  Provin8  (b) • 

To  prove  (c)  we  first  note  that  multiplying  both  sides  of  (1)  for 
t  =  T  by  e(T-j)  and  taking  expectations  gives 

E  (Y(T)  e  (T-j  )  )  =  f5  <j)0* 


We  next  note  that 


E(e(T)e(T-j ) )  =  6  dT  , 


L 

E (Y(T)e (T- j ) )  =  l  L_  E(e(k)e(T-j))  =  L_  d 

1  1 »  K  1  *  1  J  1  J 


and  that  e(l)  =  Y(l),  e(t)  =  Y(t)  -  Y(t|t-1,  . 1),  t  -  2,  . T,  where 
the  notation  Y(tjt-1,  1)  makes  explicit  which  Yf s  are  used  in  pre¬ 

dicting  Y(t ) .  Then  by  the  stationarity  of  Y  we  have  that  E  (Y(T)e(T-j ) ) 

-  E(Y(T)  [Y(T-j)  -  Y(T-j  | T- j-1,  1)])  -  E(Y(0)  [Y(-j )- Y(-j  | -j-1 . 1-T)]) 

which  by  a  standard  martingale  convergence  argument  converges  to 
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E(Y(0)e(-j))  *  .  A  similar  argument  shows  d^  — >  o2  thus 

proving  (c). 

Thus  comparing  (a)  with  (2)  and  (b)  with  (3),  it  is  clear  that  the 

elements  of  L^,  and  are  playing  the  role  in  finite  memory  prediction 
2 

°f  and  in  the  infinite  memory  case,  while  (c)  makes  explicit 

the  connection.  In  the  next  section  we  describe  how  this  algorithm  can 
be  simplified  for  an  autoregressive-moving  average  process. 


3.  Application  to  Autoregressive-moving  Average  Processes. 

The  univariate  autoregressive-moving  average  process  (Y(t),  t=0,+l,  . ..} 
of  order  (p,q)  is  defined  by 


P  <1 

l  ot(j)Y(t-j)  =  l  3 (k) e (t-k)  ,  t  *  0,  +  1,  ... 
j=0  k=0 

where  a(0)  =  3(0)  ■  1,  and  E(e(t))  =  0  ,  E(e (t) e (t+v) )  =  6^a2  . 

P 

We  assume  that  the  zeros  of  the  complex  polynomial  g(z)  =  £  a(j)zJ 

1=0 

are  all  greater  than  one  in  modulus  so  that  Y  does  indeed  have  an  infinite 
order  moving  average  representation  and  that  (defining  Ry(v)  s  E(Y(t)Y(t+v) ) 


P 

l  a (j )  RY(j-v)  =  0  ,  v  >  q  . 
j~0 


T 

Then  given  a  realization  YT  =  (Y(l),  . Y(T))  from  Y(’)  we  define 
the  following  quantities: 


i)  r  *  TOEPL  (R  (0),  ...»  R  (T-l)  where  Z(-)  is  an  autoregressive 

b  I  1  6  Lt 

process  of  order  p  with  coefficients  a(l),  a(p)  .  Thus  Z(*)  is 

referred  to  as  the  autoregressive  part  of  Y. 


ii)  ^  =  (X(l), 


X(T))T  = 


VZ,T  ~T  Where 


Z,T 


1*2  fl'z  j 


Z.T 
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Then  it  is  well  known  that  the  j  row  of  V  is  given  by 

Z ,  T 


(l,  oT_x)  j  =  1 

— ^  ( j  ”1)  *  •••»  ^  *  ^T—  j  ^  *  J  —  2,  .  p 


( -pi- ^  *  •  •  •  t  y  1>  Qrj._ j  )  9  3  $  •  •  •  y  T 


where  £  a,  (£)  R  (£~v)  =0,  v  =  1,  . . . ,  k  <  p  . 

1=0  k  Z 

Thus  there  are  only  p(p+l)/2  distinct  elements  of  V  (other  than 

^ ,  1 

0  and  1)  and  a  ^  (k)  ,  1  £  k  £  j  <  p  are  easily  obtained  from  a(l),  a(p) 

by  performing  Durbin's  recursive  algorithm  (1960)  for  decreasing  j. 

i±i)  rx,T  =  E(?T?T)  =  VZ,T  rY,T  VZ,T  where  rY,T  =  TOEPUR^O)  , .  . .  .^(T-l)  )  , 

P  q 

Since  for  t  >  p  ,  X(t)  =  £  a(j)Y(t-j)  =  £  6(k)e(t-k)  ,  we  have  for 

j=0  k=0 


J-k  »  »•  <rx,T)jk  '  "x<lj-k|) 


where 


Rx(v)  «  j  a  l  B(k)6(k+|v|)  ,  |v|  <  q 


,  M  >  q 


Thus  rx  T  is  symmetric  band  Toeplitz  in  its  last  T-p  rows  and  columns 

T 

while  its  p  x  p  principal  minor  is  given  by  V  T  V  .  Thus  F 

» P  * » P  ^  *  P  X ,  I 

is  almost  the  T  *  T  covariance  matrix  of  a  pure  moving  average  process. 
iv)  ?X,T  =  LX,T  °X,T  LX,T  ’  Since  ^rx,T^j,k  =  °  for|J“kl>  ^  » 


then  Lx  j  k  *  ®  also  for  j-k  >  q  .  Since  Lx  ^  is  nested  for  increasing 
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T  we  refer  to  the  (j,k)th  element  of  L  for  any  T  >  .j  ,k  as  L  . 

a,i  X, j ,k 

v)  The  vector  eT  =  (e(l),  ....  e(T))T  by  Lx  =  Xt  ,  l.e. 

max(j-l,q) 

e(l)  =  X(l)  ,  while  e(j)  =  X(j)  -  £  L  .  ,  .  e(j-k)  . 

k=1 

Since  V  and  L  are  nested  for  increasing  T  then  so  are  X  and 

1  A  9  1  ^  1 

eT,  i^e.  x!J+1  =  (Xj,  X(T+1) ) ,  e*+1  =  (£,  e(T+l>)  . 

With  these  quantities  defined,  the  algorithm  is  contained  in  the 
following  theorem: 


Theorem  2: 

a)  X (T+h | T)  -  X(T+h | T)  - 


P 

l  a(j)  Y (T+h- j | T ) 
j-1 


where 


r 


i)  X(T+h| T)  = 


l  L 


k=h 


X,T+h,T+h-k 


e(T+h-k) 


h  =  1, 


^  0  h  >  q 

ii)  Y (T+h-j |T)  =  Y (T+h- j )  if  j  >  h 

b)  a2  h  =  E  {Y (T+h)  -  Y(T+h|T)}2 

h-i  2 

£  ^VZ,T+h  LX , T+h ^ T+h ,T+h-k  dT+h-k 

c)  i)  Lx  T  T_k  —*  8(k)  ,  k«l,  ....  qasT-#« 

U>  (Vz!t  LX,T>T,T-k  -*  B.<k>  M  T  ^  " 

iii)  Let  y(0)  *  1,  y  (1),  Y  (2),  . ..  be  the  coefficients  of  the 


infinite  order  moving  average  representation  of  the  auto¬ 
regressive  part  of  Y.  Then 


-9- 


(!rt,p.H-'<l>  •  k-°'  1 . J-° 


lv>  Vz!p+j,k  "  J1“(OVp+j-C.k  ’  k"L . P~1 


Proof 

Since  rY  T  =  TOEFltt^O),  ....  ^(T-D)  -  V‘|T  rx>T»^T 
-1  T  -T 

=  VZ  T  LX  T  DX  T  LX  T  VZ  T  ancl  the  Cholesky  decomposition 

is  unique,  then  (b)  and  (c,ii)  follow  immediately  from  Theorem  1. 

Also,  Theorem  1  shows  that  the  elements  of  the  rows  of  V  ^  are  con- 

z ,  1 

verging  to  the  infinite  moving  average  representation  of  the  autore¬ 
gressive  part  of  Y.  Thus  the  rows  of  L  are  converging  to  the  infinite 
moving  average  representation  of  the  moving  average  part,  that  is  to 


3(1),  B(q)  . 

To  prove  (a),  note  that  since  X  *  V  ,  we  have  p 

i  z. ,  i  ^  j.  -w 

'VW)  ■  vZiT  eYjTjh 


XY,T,h 


Also  for  T+h  >  p  ,  T  h 


=  E(?  [X(T+h)  -  l  a ( j )  Y (T+h- j ) ] )  =  p 


j-1 


~X,T,h 


l  a(j)p 

j=l 


XY,T,h-j  ’ 


=  ex,T,h  -  a<J>  vz,T  eY,T,h-j  *  where  ex,T,h  =  E(V(T+h))-  Thus 


^Y,T  h  ^ Z  T  ~XY  T  h  ^Z  T  ~X  T  h  y  h— j  * 


Therefore,  Y(T+h | T)  = 


a(1)eY,T,h-j 


T  T  -T  -1 

~Y,T,h  Z,T  LX,T  X,T  ?T 

I-T  -1  T  -T 

X,T  X,T  ~T  ~X,T,h  X,T 


T  -T  T  -T  -1 
-X,T,h  Z,T  X,T  X,T  X,T?T 


°X,T  -T 


-  I  a(j)  Y (T+h- j | T)  . 

j-1 
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An  argument  identical  to  that  used  in  the  proof  of  Theorem  (la)  proves 

part  (i)  of  (2a) .  To  verify  (ii)  we  substitute  for  e  and  X 

-1  -  T 

to  obtain  V^jT  l£t  D"*t  ;t  -  pJ.T.h-j  r^TLT  -  «™-d>  , 

T 

since  pY  ^  h_^  is  the  (T+h-j)th  row  of  T. 

To  prove  (2ciii)  we  note  that  comparing  coefficients  of  like  powers 

P  oo 

of  z  in  the  equation  1/  I  a(j)z^  =  £  y(k)z  gives  the  following  recur- 

j=0  k=0 

k 

sion  for  y:  £  a(£)y(k--£)  =  6,  ,  k  >_  0.  Thus  we  need  only  show 
1=0  k 

k  _1 

J0°(£)V^,P+j,P+j-(k-£)  =  6k  ’  k  =  0,  j  .  But  the  left  hand  side 

of  this  equation  is  just  the  (p+j)th  row  of  V  ^  times  the  (p+j-k)th 

^  >  1 

column  of  V„  .  Finally,  (2civ)  follows  by  multiplying  the  (p+j)th 
^  f 

row  of  V  times  the  kth  column  of  V  for  k  =  1,  ....  p-1  . 

Z  Z 

o 

From  Theorem  2  we  see  that  to  find  Y  (T+h  I  T)  and  oZ,  ,  for  h  =  h-  , 

1 ,  h  1 

h^  and  T  =  T^,  . T2,  one  essentially  needs  to  calculate 
VZ,T2+h2  ,  LX,T2+h2  *  and  °X,T2+h2 


Theorem  (2ci)  shows  that  there  are 


q  nonzero,  nonone  elements  in  a  row  of  L  and  that  these  elements  are 
converging  to  the  coefficients  of  the  moving  average  part  of  Y.  Theorem 
(2ciii)  shows  that  only  the  first  p-1  elements  of  rows  of  V ^  , 

are  not  one  of  y(0),  y(l)»  while  (2civ)  shows  these  elements  are 


easily  calculated  recursively.  Thus  the  number  of  elements  in  V. 


r-l 

Z  »T2+h2 

and  L  that  need  to  be  calculated  and  stored  in  a  computer  program 

•**>  1 2^0  2 

increases  linearly  with  the  number  of  rows  needed  prior  to  attaining 


convergence.  This  convergence  is  illustrated  in  the  next  section. 


-II- 


4.  A  Numerical  Example 

Consider  the  autoregressive-moving  average  process  Y  of  order  p  =  4 
and  q  ~  3  with  a(l)  =  -.3357,  a(2)  =  .0821,  a(3)  =  .1570,  a(4)  -  .2567, 

6(1)  =  -.6077,  6(2)  =  .0831,  6(3)  =  .1903,  and  o2  =  1.  Then  the  variances 
and  first  10  autocorrelations  of  Y,  (denoted  py(*))*  the  autoregressive 

part  of  Y  (denoted  P^  (• ) )  »  and  the  moving  average  part  of  Y  (denoted  Py( ' ) 
are  given  in  Table  1,  while  Table  2  gives  the  first  10  terms  in  the  in¬ 
finite  order  moving  average  representation  of  Y ,  Z,  and  W. 


Table  1.  Variances  and  First  10  Autocorrelations  py(0,  P^O)* 

Y,  autoregressive  part  of  Y,  and  moving  average  part  of  Y  where 
Y  is  the  above  ARMA  (4,3)  process. 


v  Py(v)  Pz(v)  Pw<v) 


1 

-.2227 

.3806 

-.4548 

2 

-.0749 

-.0112 

-.0230 

3 

.0616 

-.2897 

.1347 

4 

-.1949 

-.4128 

0 

5 

-.0015 

-.2107 

0 

6 

.0250 

.0115 

0 

7 

.0233 

.1603 

0 

8 

.0560 

.1919 

0 

9 

.0134 

.1036 

0 

10 

-.0102 

-.0091 

0 

Variance 

1.1306 

1.3891 

1.4124 
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